A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data

Authors

  • Muhammad Abulaish
  • Tarique Anwar

Abstract

Due to the increasing accumulation of textual data on the World Wide Web, tag clouds have become an effective tool for quickly perceiving the most prominent terms embedded in that data. Tag clouds help readers grasp the main theme of a corpus without exploring the pile of documents. However, the effectiveness of tag clouds in conceptualizing text corpora is directly proportional to the quality of the tags extracted from them. In this paper, we propose a keyphrase-based tag cloud generation framework to conceptualize textual data. In contrast to existing tag cloud generation systems, which use single words as tags and their frequency counts to determine the tags' font sizes, the proposed framework identifies feasible keyphrases and uses them as tags. The font size of a keyphrase is determined as a function of its relevance weight. Instead of using partial or full parsing, which is inefficient for lengthy sentences and inaccurate for sentences that do not follow proper grammatical structure, the proposed method applies n-gram techniques followed by various heuristics-based refinements to identify candidate phrases in text documents. A rich set of lexical and semantic features is identified to characterize the candidate phrases and determine their keyphraseness and relevance weights. We also propose a font-size determination function, which uses the relevance weights of the keyphrases to determine their relative font sizes for tag cloud visualization. The efficacy of the proposed framework is established through experiments and a comparison with existing state-of-the-art tag cloud generation methods.
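The abstract describes two mechanical steps that are easy to illustrate: generating candidate phrases with n-grams plus heuristic refinement, and mapping relevance weights to font sizes. The following is a minimal sketch of both, not the authors' method. The stopword list, the "no stopword at either end of a phrase" heuristic, the linear weight-to-size scaling, the 10 to 36 pt range, and the use of raw frequency as a stand-in relevance weight are all assumptions; the paper's lexical and semantic features and its actual font-size function are not reproduced here.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "or", "for",
             "is", "are", "be", "by", "with", "as", "it", "its", "that", "this"}


def candidate_phrases(text, max_n=3):
    """Count n-grams (n = 1..max_n) that stay inside one sentence and
    neither start nor end with a stopword (hypothetical refinement heuristic)."""
    counts = Counter()
    # Treat sentence punctuation as a hard boundary so n-grams never cross it.
    for sentence in re.split(r"[.!?;:]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts


def font_sizes(weights, min_pt=10, max_pt=36):
    """Map relevance weights linearly onto [min_pt, max_pt] (assumed scaling)."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights equal: give every phrase the largest size.
        return {phrase: max_pt for phrase in weights}
    scale = (max_pt - min_pt) / (hi - lo)
    return {phrase: round(min_pt + (w - lo) * scale) for phrase, w in weights.items()}


if __name__ == "__main__":
    text = ("Tag clouds summarize text. A tag cloud shows prominent terms. "
            "Tag clouds help readers.")
    counts = candidate_phrases(text)
    # Stand-in relevance weight: raw frequency, keeping candidates seen at least twice.
    weights = {p: c for p, c in counts.items() if c >= 2}
    for phrase, size in sorted(font_sizes(weights).items(), key=lambda kv: -kv[1]):
        print(f"{size:>3}pt  {phrase}")
```

Because `font_sizes` accepts any mapping from phrase to weight, a feature-based relevance score could replace the frequency stand-in without changing the scaling step.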


Similar resources

Keyphrase Cloud Generation of Broadcast News

This paper describes an enhanced automatic keyphrase extraction method applied to Broadcast News. The keyphrase extraction process is used to create a concept level for each news. On top of words resulting from a speech recognition system output and news indexation and it contributes to the generation of a tag/keyphrase cloud of the top news included in a Multimedia Monitoring Solution system f...

Semantic Tag Cloud Generation via DBpedia

Many current recommender systems exploit textual annotations (tags) provided by users to retrieve and suggest online contents. The text-based recommendation provided by these systems could be enhanced (i) using unambiguous identifiers representative of tags and (ii) exploiting semantic relations among tags which are impossible to be discovered by traditional textual analysis. In this paper we c...

Tag Cloud Reorganization: Finding Groups of Related Tags on Delicious

Tag clouds have become an appealing way of navigating through web pages on social tagging systems. Recent research has focused on finding relations among tags to improve visualization and access to web documents from tag clouds. Reorganizing tag clouds according to tag relatedness has been suggested as an effective solution to ease navigation. Most of the approaches either rely on co-occurrence...

TEXTUAL AND INTER-TEXTUAL ANALYSES OF IRANIAN EFL UNDERGRADUATES’ TYPES OF ENGLISH READING TOWARDS DEVELOPING A CAREFUL READING FRAMEWORK

This study investigated textual and inter-textual reading of a group of Iranian EFL undergraduates’ careful English reading types. In this research, Khalifa and Weir’s (2009) reading framework was used to propose a more inclusive aspect of a careful reading framework and the reading construct for instructional and assessment goals. The participants of this study were B.A. students of English Tr...

A New Text-Mining Method for Extracting User Context Information to Improve Search Engine Result Ranking

Today, the importance of text processing and its usages is well known among researchers and students. The amount of textual, documental materials increase day by day. So we need useful ways to save them and retrieve information from these materials. For example, search engines such as Google, Yahoo, Bing and etc. need to read so many web documents and retrieve the most similar ones to the user ...

Journal title:
  • IJARAS

Volume 4

Publication date: 2013